from IPython.display import HTML
HTML('''<button type="button" class="btn btn-outline-danger" onclick="codeToggle();">Toggle Code</button>''')
India is known for its diversity, be it in cultures, climates, languages, or even food. A plethora of dishes are made to satisfy the hunger of 138 crore stomachs. Some are seasonal, some come and go with festivals, some are made from imported ingredients, some are exported, some are eaten only by a particular community, and some are forbidden in certain religions. What any one of us eats is just the tip of the iceberg compared to what the whole of India eats.
We decided to analyse food trends in India. The data behind our analysis is not food consumption data; instead, it represents the popularity of a food item. Using Google Trends, we obtained the search frequency of multiple dishes and raw ingredients from 2019 to 2021. We assume that, within some margin of error, search interest in a food item tracks its consumption. Google Trends also gives the search frequency broken down by Indian state.
Our data needed quite a bit of pre-processing. Google Trends only gives relative data, not absolute counts, which meant that every food item's popularity was capped at 100. However, a very neat feature is that it lets us compare up to 5 items together. So we agreed on a 'baseline term' that is searched along with each individual food item. We then downloaded the CSV files, which contain the relative number of searches per week in our time range. We rescaled the values in each file so that the baseline term has the same values throughout, which makes different food items comparable with each other. The baseline term that we used is 'Drinking water'.
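The rescaling step can be sketched as follows. This is one plausible reading of it, not necessarily the exact procedure we ran: it assumes a single scale factor per downloaded file, chosen so that the baseline total in that file matches the baseline total in a reference download. The function name `rescale_to_baseline` and all the numbers are made up for illustration.

```python
def rescale_to_baseline(item, baseline, reference_baseline):
    """Rescale one download so its baseline total matches a shared reference series."""
    if len(item) != len(baseline) or len(baseline) != len(reference_baseline):
        raise ValueError("all series must cover the same weeks")
    baseline_total = sum(baseline)
    if baseline_total == 0:
        raise ValueError("baseline has no search interest; cannot rescale")
    # One factor for the whole file (an assumption, see the text above)
    factor = sum(reference_baseline) / baseline_total
    return [value * factor for value in item]


# Made-up weekly values for illustration only (not from our dataset)
reference_baseline = [20, 20, 40]  # 'Drinking water' in the reference download
baseline_in_file = [10, 10, 20]    # the same term, squeezed by a more popular item
item_in_file = [50, 100, 75]       # the food item downloaded alongside it

rescaled = rescale_to_baseline(item_in_file, baseline_in_file, reference_baseline)
print(rescaled)  # [100.0, 200.0, 150.0]
```

After rescaling, values are no longer capped at 100, which is what allows a very popular item and a niche one to sit on the same axis.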
# Importing Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import rc
from matplotlib.animation import FuncAnimation
from matplotlib import animation
import plotly.express as px
import plotly.io as pio
pio.renderers.default = 'notebook'
from plotly.offline import plot, iplot, init_notebook_mode
init_notebook_mode(connected=True)
rc('animation', html='jshtml')
frn = 10 # Number of frames to process in the animation
fps = 0.5 # Frames per second
mywriter = animation.PillowWriter(fps=fps)
Here's what our data for mango looks like:
# Data Mango
mango_df = pd.read_csv("./data/mango.csv")
mango_df.head(10)
| | Week | mango: (India) |
|---|---|---|
| 0 | 2019-01-06 | 10 |
| 1 | 2019-01-13 | 10 |
| 2 | 2019-01-20 | 11 |
| 3 | 2019-01-27 | 10 |
| 4 | 2019-02-03 | 12 |
| 5 | 2019-02-10 | 12 |
| 6 | 2019-02-17 | 12 |
| 7 | 2019-02-24 | 11 |
| 8 | 2019-03-03 | 13 |
| 9 | 2019-03-10 | 14 |
And here's what it looks like on a timeline:
# Mango timeline
fig = px.line(mango_df, x='Week', y='mango: (India)', labels={
"Week": "Timeline",
"mango: (India)": "Mango"
}, height=400)
fig.update_layout(margin = {
"r": 50,
"b": 50,
"t": 50,
"l": 50
})
fig.update_layout(title = {
"text": "Interest Over Time",
"x": 0.5,
"y": 0.95
})
fig.show()
From this graph, we observe that the interest in mangoes is significantly higher in the months of May-June, which is the peak mango season.
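A seasonal peak like this can also be read off numerically by averaging the weekly values per month. The sketch below uses only the ten rows of the mango table shown earlier, so it covers January to March 2019 and cannot reach the May-June peak; on the full file the same idea should apply. The helper name `monthly_mean` is ours, chosen for illustration.

```python
from collections import defaultdict
from datetime import date

# First ten weekly rows of the mango table shown earlier: (week start, relative interest)
weekly = [
    ("2019-01-06", 10), ("2019-01-13", 10), ("2019-01-20", 11), ("2019-01-27", 10),
    ("2019-02-03", 12), ("2019-02-10", 12), ("2019-02-17", 12), ("2019-02-24", 11),
    ("2019-03-03", 13), ("2019-03-10", 14),
]


def monthly_mean(rows):
    """Average weekly interest per calendar month, keyed by (year, month)."""
    buckets = defaultdict(list)
    for week, interest in rows:
        day = date.fromisoformat(week)
        buckets[(day.year, day.month)].append(interest)
    return {month: sum(values) / len(values) for month, values in buckets.items()}


means = monthly_mean(weekly)
peak_month = max(means, key=means.get)  # month with the highest average interest
print(means)       # {(2019, 1): 10.25, (2019, 2): 11.75, (2019, 3): 13.5}
print(peak_month)  # (2019, 3)
```

Even in this short slice the interest is already climbing week by week towards the summer.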
The data also shows when a food item became popular because of a festival or a news event rather than its growing season. Some notable examples are shown below:
# Timeline
dragon_fruit_df = pd.read_csv("./data/dragon_fruit.csv")
fig = px.line(dragon_fruit_df, x='Week', y='dragon fruit: (India)', labels={
"Week": "Timeline",
"dragon fruit: (India)": "Dragon Fruit"
}, height=400)
fig.update_layout(margin = {
"r": 50,
"b": 50,
"t": 50,
"l": 50
})
fig.update_layout(title = {
"text": "Interest Over Time",
"x": 0.5,
"y": 0.95
})
fig.show()
Search data for dragon fruit in India shows a distinctive peak in January 2021. This peak coincides with the Gujarat Government renaming dragon fruit to 'kamalam'. The distinctive shape and colour of the dragon fruit make it look similar to a lotus, which is called kamal in Hindi. This was statewide news, and we're guessing that it made people search for dragon fruit. (source)
# Timeline
chocolate_df = pd.read_csv("./data/chocolate.csv")
fig = px.line(chocolate_df, x='Week', y='chocolate: (India)', labels={
"Week": "Timeline",
"chocolate: (India)": "Chocolate"
}, height=400)
fig.update_layout(margin = {
"r": 50,
"b": 50,
"t": 50,
"l": 50
})
fig.update_layout(title = {
"text": "Interest Over Time",
"x": 0.5,
"y": 0.95
})
fig.show()
This one is a very cute example of how festivals also have an impact. Chocolate peaks in the second week of February in every year. This is when Valentine's Day is celebrated, which is most likely the biggest contributor to chocolate's popularity in February.
# Timeline
dalgona_candy_df = pd.read_csv("./data/dalgona_candy.csv")
fig = px.line(dalgona_candy_df, x='Week', y='dalgona candy: (India)', labels={
"Week": "Timeline",
"dalgona candy: (India)": "Dalgona candy"
}, height=400)
fig.update_layout(margin = {
"r": 50,
"b": 50,
"t": 50,
"l": 50
})
fig.update_layout(title = {
"text": "Interest Over Time",
"x": 0.5,
"y": 0.95
})
fig.show()
Another interesting example is the spike in searches for Dalgona candy in September 2021, when the Korean drama series Squid Game was released. One of the challenges in the show involves cutting a shape out of Dalgona candy, without breaking it, within a fixed time. The candy became famous once the series was a success.
# Timeline
undhiyu_df = pd.read_csv("./data/undhiyu.csv")
fig = px.line(undhiyu_df, x='Week', y='Undhiyu: (India)', labels={
"Week": "Timeline",
"Undhiyu: (India)": "Undhiyu"
}, height=400)
fig.update_layout(margin = {
"r": 50,
"b": 50,
"t": 50,
"l": 50
})
fig.update_layout(title = {
"text": "Interest Over Time",
"x": 0.5,
"y": 0.95
})
fig.show()
Undhiyu is a seasonal Gujarati dish made in the winter. It is made of many winter vegetables and is cooked in almost every household in Gujarat during Makar Sankranti (January 14), which matches the peaks in the graph.
# Timeline
haleem_df = pd.read_csv("./data/haleem.csv")
fig = px.line(haleem_df, x='Week', y='Haleem: (India)', labels={
"Week": "Timeline",
"Haleem: (India)": "Haleem"
}, height=400)
fig.update_layout(margin = {
"r": 50,
"b": 50,
"t": 50,
"l": 50
})
fig.update_layout(title = {
"text": "Interest Over Time",
"x": 0.5,
"y": 0.95
})
fig.show()
Haleem is a dish that is mainly prepared during the month of Ramadan in the Islamic Hijri calendar, most famously in Hyderabad, Telangana. It has a significant cultural history among Muslims and is very popular in the Middle East and the Indian subcontinent. As we can see in the plot above, the spikes align with the Ramadan months of those years. Haleem originated from Harees, which was introduced to the Hyderabad Nizam's army by Arab soldiers and evolved into Haleem over time. Haleem from Hyderabad is transported all over the country.
A search term not completely unrelated to food, 'homemade', had a ginormous peak in April-May of 2020, when the first COVID lockdown was imposed:
# Timeline
homemade_df = pd.read_csv("./data/homemade.csv")
fig = px.line(homemade_df, x='Week', y='homemade: (India)', labels={
"Week": "Timeline",
"homemade: (India)": "Homemade"
}, height=400)
fig.update_layout(margin = {
"r": 50,
"b": 50,
"t": 50,
"l": 50
})
fig.update_layout(title = {
"text": "Interest Over Time",
"x": 0.5,
"y": 0.95
})
fig.show()
This most likely reflects people looking for recipes to make their favourite foods at home, since restaurants and hotels had to close down.
Similarly, we figured that a worldwide pandemic affecting our daily lives must also have affected the food in them. With this data, we also hoped to gain some insight into how COVID affected food searches: were there more than usual, or fewer, and why? We made a line chart of some of the items in our dataset.
# Timeline
covid_slope_df = pd.read_csv("./data/covid_slope_chart.csv")
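# Plot every column except 'Week' as its own line (wide-form data)
fig = px.line(covid_slope_df, x='Week', y=[c for c in covid_slope_df.columns if c != 'Week'], labels={
"Week": "Timeline",
"value": "Interest",
"variable": "Search term"
}, height=400)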
fig.update_layout(margin = {
"r": 50,
"b": 50,
"t": 50,
"l": 50
})
fig.update_layout(title = {
"text": "Interest Over Time",
"x": 0.5,
"y": 0.95
})
fig.show()
We also made a slope chart of the three years' data to observe the effect of COVID.
# Script to get yearly data
# chocolate_df = pd.read_csv("./data/covid_slope_chart.csv")
# data = [[2019, 0], [2020, 0], [2021, 0]]
# for index, row in chocolate_df.iterrows():
#     date_str = row['Week']
#     item_str = 'drinking water: (India)'
#     if date_str.startswith('2019'):
#         data[0][1] += row[item_str]
#     if date_str.startswith('2020'):
#         data[1][1] += row[item_str]
#     if date_str.startswith('2021'):
#         data[2][1] += row[item_str]
# print(str(data[0][1]) + ',' + str(data[1][1]) + ',' + str(data[2][1]))
data = [
['momos',382,876,596],
['jalebi',1244,1305,836],
['panipuri',138,355,248],
['samosa',870,1484,877],
['drinking water',581,832,765],
# Dummy rows that pin every axis to the same 0-1500 range so the slopes are comparable
['bottom',0,0,0],
['top',1500,1500,1500]
]
# chocolate_df_2 = pd.DataFrame(data, columns = ['Year', 'Amount'])
slope_df = pd.DataFrame(data, columns = ['name', '2019', '2020', '2021'])
# Create the slope chart:
fig = px.parallel_coordinates(
slope_df,
# color= "Unnamed: 0",
# labels={"Amount": "Amount", "Year": "Year"},
labels = {"name": "name", "2019": "2019", "2020": "2020", "2021": "2021"},
color_continuous_scale=px.colors.diverging.Tealrose,
color_continuous_midpoint=2, width=500)
# Hide the color scale that is useless in this case
fig.update_layout(coloraxis_showscale=False)
# Show the plot
fig.show()
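The yearly totals that feed the slope chart could be computed along these lines. This is a minimal sketch using only the standard library rather than the exact script we ran; it assumes the 'Week' column starts with the year and that every value is a plain integer. The function name `yearly_totals` and the tiny inline CSV are hypothetical stand-ins for `./data/covid_slope_chart.csv`.

```python
import csv
import io
from collections import defaultdict


def yearly_totals(csv_file, column):
    """Sum one column of a weekly Google Trends export per calendar year."""
    totals = defaultdict(int)
    for row in csv.DictReader(csv_file):
        year = int(row["Week"][:4])  # 'Week' looks like '2019-01-06'
        totals[year] += int(row[column])
    return dict(totals)


# Made-up rows for illustration only (these are not our real values)
sample = io.StringIO(
    "Week,drinking water: (India)\n"
    "2019-12-22,10\n"
    "2019-12-29,12\n"
    "2020-01-05,11\n"
    "2021-01-03,9\n"
)
totals = yearly_totals(sample, "drinking water: (India)")
print(totals)  # {2019: 22, 2020: 11, 2021: 9}
```

Running it once per column gives one row of the `data` list used for the chart.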
In the first 1-2 months of the COVID outbreak in India, when the whole country was under lockdown, there was a spike in searches for recipes of many food items that aren't typically made at home. Items like momos, jalebi, dhokla, panipuri, samosa, and cake were not available outside, and people stuck at home wanted to make them to satisfy their taste buds. As we can see in the line chart above, there is a spike in all of the food items in April 2020. We infer that as local shops reopened and people grew less cautious in the later COVID waves, they no longer needed to search for recipes online.
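To put numbers on the rise and fall, the yearly totals from the slope chart data can be turned into year-over-year percentage changes. The sketch below copies those totals as they appear above; the helper name `percent_change` is ours. In this data every item, including the 'drinking water' baseline, goes up from 2019 to 2020 and down again in 2021.

```python
# Yearly totals copied from the slope chart data above: name -> (2019, 2020, 2021)
yearly = {
    "momos": (382, 876, 596),
    "jalebi": (1244, 1305, 836),
    "panipuri": (138, 355, 248),
    "samosa": (870, 1484, 877),
    "drinking water": (581, 832, 765),
}


def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100


for name, (y2019, y2020, y2021) in yearly.items():
    rise = percent_change(y2019, y2020)
    fall = percent_change(y2020, y2021)
    print(f"{name:15s} 2019->2020: {rise:+7.1f}%   2020->2021: {fall:+7.1f}%")
```

Comparing each item's change with that of the baseline row is one way to judge how much of the 2020 rise is specific to that food.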
import json
# Load the state boundaries and index each feature by its state code
with open("data/states_india.geojson", "r") as geojson_file:
    india_geodata = json.load(geojson_file)
state_id_map = {}
for feature in india_geodata["features"]:
    feature["id"] = feature["properties"]["state_code"]
    state_id_map[feature["properties"]["st_nm"]] = feature["id"]
def generate_map(csv_name, title):
    """Draw a state-level choropleth of search interest from a Google Trends CSV."""
    df = pd.read_csv("data/" + csv_name)
    df["Interest"] = df["Interest"].fillna(0)
    # Rename states so that they match the spellings used in the GeoJSON file
    df["State"] = df["State"].replace({
        "Delhi": "NCT of Delhi",
        "Dadra and Nagar Haveli": "Dadara & Nagar Havelli",
        "Andaman and Nicobar Islands": "Andaman & Nicobar Island",
        "Jammu and Kashmir": "Jammu & Kashmir",
        "Daman and Diu": "Daman & Diu",
        "Arunachal Pradesh": "Arunanchal Pradesh",
    })
    # Look up the GeoJSON id of each state (raises KeyError for an unknown name)
    df["id"] = df["State"].apply(lambda x: state_id_map[x])
    fig = px.choropleth(df, geojson=india_geodata, locations='id', color='Interest',
                        color_continuous_scale="Blues",
                        range_color=(0, 100),
                        hover_name="State",
                        hover_data=["Interest"],
                        title=title,
                        )
    fig.update_geos(fitbounds="locations", visible=False)
    fig.update_layout(margin={"r":5,"t":10,"l":5,"b":10})
    return fig
fig = generate_map("geoMap.csv", "title")
fig.show()
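Because `generate_map` looks every state name up in `state_id_map`, a single unexpected spelling in the CSV stops it with a `KeyError`. A small check like the one below can list the offending names first. It is only a sketch: the helper name `unmatched_states` is ours, and the set of GeoJSON names is a made-up subset for illustration (in the notebook it would be `state_id_map.keys()`).

```python
# Spelling fixes taken from generate_map above: Google Trends name -> GeoJSON name
RENAMES = {
    "Delhi": "NCT of Delhi",
    "Dadra and Nagar Haveli": "Dadara & Nagar Havelli",
    "Andaman and Nicobar Islands": "Andaman & Nicobar Island",
    "Jammu and Kashmir": "Jammu & Kashmir",
    "Daman and Diu": "Daman & Diu",
    "Arunachal Pradesh": "Arunanchal Pradesh",
}


def unmatched_states(states, known_names, renames=RENAMES):
    """Return the state names that still have no GeoJSON match after renaming."""
    return [state for state in states if renames.get(state, state) not in known_names]


# Hypothetical subset of GeoJSON names, for illustration only
known = {"NCT of Delhi", "Gujarat", "Arunanchal Pradesh"}
missing = unmatched_states(["Delhi", "Gujarat", "Arunachal Pradesh", "Ladakh"], known)
print(missing)  # ['Ladakh']
```

An empty list means every row of the CSV will find its state on the map.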